268 research outputs found

    Position score weighting technique for mining web content outliers.

    Existing methods for mining web content outliers use a stemming algorithm to preprocess the web documents and leave the domain dictionary in its root words. A stemming algorithm reduces derived words to their stem, base or root form. However, stemming does not always leave a real word, which makes it difficult to match words in the full word profile against the domain dictionary. This study therefore uses a stemmed domain dictionary and applies it with the Term Frequency with Position Score (TF.PS) weighting technique, which is derived from the TF.IDF weighting technique from Information Retrieval (IR), in the dissimilarity measure phase to assess the efficiency of this technique for determining outliers in web content. The dataset is The 20 Newsgroups Dataset. The stemmed domain dictionary with TF.PS weighting achieves up to 98.19% accuracy and a 90% F1-measure, which is higher than previous techniques.

    First discovery augmented reality for learning solar systems

    The development of Augmented Reality (AR) systems in educational settings should be given more attention and recognition for its contribution to the evolution of education. Although this shift in pedagogical method may disrupt the traditional curriculum model, it also offers a great opportunity to complement and improve the modern education model. This paper presents an AR-based mobile application for exploring Space and Science for primary school students called First Discovery (FD). The application supplements a traditional book that contains 10 target images of the solar system and its planets, which can be scanned by the AR camera in the FD application. An evaluation was carried out among primary school children, elementary educators and parents, which showed a highly favorable response. It is hoped that the proposed FD application can improve children’s ability to retain knowledge after the AR science learning experience, enhance the accessibility of science learning content for children, and develop creative learning and children’s ability in exploring and problem solving.

    Classic term weighting technique for mining web content outliers

    Outlier analysis has become a popular topic in the field of data mining, but there has been less work on how to detect outliers in web content. Mining Web Content Outliers is used to detect irrelevant web content within a web portal. Term Frequency (TF) techniques from Information Retrieval (IR) have been used to determine the relevancy of a term in a web document. However, when document length varies, relative frequency is preferred. This study used maximum frequency normalization and applied the Inverse Document Frequency (IDF) weighting technique, a traditional term weighting method in IR, to exploit terms that occur in fewer documents, which are considered more discriminative than frequent terms. The dataset is The 20 Newsgroups Dataset. TF.IDF is used in the dissimilarity measure, and the result achieves up to 91.10% accuracy, which is about 17.77% higher than the previous technique.
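
    The weighting described in this abstract can be illustrated with a minimal sketch. It assumes the common textbook form of TF.IDF, with term frequency divided by the count of the most frequent term in the document and IDF taken as log(N / df). The abstract does not give its exact formula, so the function name `tf_idf` and the toy documents below are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formula (not taken from the paper):
        tf  = count(term, doc) / max count of any term in doc
        idf = log(N / number of documents containing term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        weights.append({
            term: (count / max_count) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus: three already-tokenized documents.
docs = [
    ["space", "shuttle", "launch", "space"],
    ["space", "orbit", "launch"],
    ["hockey", "goal", "space"],
]
weights = tf_idf(docs)
```

    In this toy corpus "space" occurs in every document, so its IDF (and weight) is zero, while a term such as "hockey" that occurs in a single document receives the largest weight.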

    Customer profiling using classification approach for bank telemarketing

    Telemarketing is a type of direct marketing in which a salesperson contacts customers to sell products or services over the phone. The database of prospective customers comes from a direct marketing database. It is important for the company to predict the set of customers with the highest probability of accepting the sale or offer based on their personal characteristics or shopping behaviour. Recently, companies have started to resort to data mining approaches for customer profiling. This project focuses on helping banks to increase the accuracy of their customer profiling through classification, as well as on identifying a group of customers who have a high probability of subscribing to a long-term deposit. In the experiments, three classification algorithms are used: Naïve Bayes, Random Forest, and Decision Tree. The experiments measured accuracy, precision and recall, and showed that classification is useful for predicting customer profiles and increasing telemarketing sales.
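
    The three measures named in this abstract can be computed from a confusion matrix, as in the sketch below. The labels "yes" and "no" (subscribed to a deposit or not), the function name `binary_metrics`, and the sample predictions are hypothetical and are not data from the study.

```python
def binary_metrics(y_true, y_pred, positive="yes"):
    """Return (accuracy, precision, recall) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or labeled positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical outcomes for six customers.
y_true = ["yes", "no", "no", "yes", "no", "yes"]
y_pred = ["yes", "no", "yes", "no", "no", "yes"]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

    Here two subscribers are found, one is missed and one non-subscriber is wrongly flagged, so accuracy, precision and recall all come to 2/3.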

    Comparative analysis of text classification algorithms for automated labelling of Quranic verses

    The ultimate goal of labelling a Quranic verse is to determine its corresponding theme. However, the existing Quranic verse labelling approach depends primarily on the availability of Quranic scholars who have expertise in the Arabic language and Tafseer. In this paper, we propose to automate the labelling of Quranic verses using text classification algorithms. We applied three text classification algorithms, namely k-Nearest Neighbour, Support Vector Machine, and Naïve Bayes, to automate the labelling procedure. In our experiment, English translations of the verses are used as features. The English translations of the verses are then classified as “Shahadah” (the first pillar of Islam) or “Pray” (the second pillar of Islam). All of the text classification algorithms were found to be capable of achieving more than 70% accuracy in labelling the Quranic verses.

    Bayesian approach to classification of football match outcome

    Football match outcome prediction has gained popularity in recent years. It attracts many types of fans, from expert analysts to football team managers and others, who try to predict the match result before the match starts. Three types of approaches have been proposed to predict a win, loss or draw and to evaluate the attributes of a football team: the statistical approach, the machine learning approach and the Bayesian approach. This paper proposes Bayesian approaches within machine learning, namely Naive Bayes (NB), Tree Augmented Naive Bayes (TAN) and General Bayesian Network (K2), to predict football match outcomes. The data are English Premier League match results for three seasons (2016 – 2017, 2015 – 2016 and 2014 – 2015) downloaded from http://www.football-data.co.uk. The experimental results showed that TAN achieved the highest predictive accuracy, 90.0% on average across the three seasons, compared with the other Bayesian approaches (K2 and NB). It is hoped that the results of this research can be used in future research on predicting football match outcomes.

    Formulating layered adjustable autonomy for unmanned aerial vehicles

    Purpose - In this paper, we propose Layered Adjustable Autonomy (LAA) as a dynamically adjustable autonomy model for a multi-agent system. It is mainly used to efficiently manage humans’ and agents’ shared control of autonomous systems and to maintain humans’ global control over the agents. Design/Methodology/Approach - We apply the LAA model in an agent-based autonomous Unmanned Aerial Vehicle (UAV) system. The UAV system implementation consists of two parts: software and hardware. The software part represents the controller and the cognitive part, and the hardware represents the computing machinery and the actuator of the UAV system. The UAV system performs three experimental scenarios: dance, surveillance and search missions. The selected scenarios demonstrate different behaviors in order to create a suitable test plan and ensure significant results. Findings - The results of the UAV system tests show that segregating the autonomy of a system into multidimensional and adjustable layers enables humans and/or agents to perform actions at convenient autonomy levels. This reduces the drawbacks of adjustable autonomy: constraining the autonomy of the agents, increasing humans’ workload and exposing the system to disturbances. Originality/value - The application of the LAA model in a UAV demonstrates the significance of implementing dynamic adjustable autonomy. Assessing autonomy within the three phases of the agents’ run cycle (task-selection, actions-selection, actions-execution) is an original idea that aims to direct agents’ autonomy towards performance competency. The agents’ abilities are well exploited when an incompetent agent switches with a more competent one.

    Questionify gamification in education

    In the education industry, lecturers are finding ways to improve students’ concentration and grades by using smart devices to track students’ assignment or tutorial progress. One of the few possible and attractive solutions is the gamification technique. This paper proposes an educational application called Questionify that implements gamification elements and allows users to collect points and gain achievements, increasing motivation and engagement with students’ coursework in the Software Engineering subject. Questionify was developed using the C# and Java languages and was evaluated using a questionnaire among 24 respondents. The findings showed that the respondents believe that gamification can do better in education compared to the traditional method of teaching students. In the future, this gamification approach will be tested on more technical subjects, such as programming and networking, to help students engage in a different learning approach.

    Machine learning approach for flood risks prediction

    Floods are one of the main natural disasters that happen all around the globe, caused by the laws of nature. They have caused vast destruction of property and livestock and even loss of life. Therefore, the need to develop an accurate and efficient flood risk prediction as an early warning system is essential. This study aims to develop a predictive model following the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, using a Bayesian network (BN) and other Machine Learning (ML) techniques such as Decision Tree (DT), k-Nearest Neighbours (kNN) and Support Vector Machine (SVM), for flood risk prediction in Kuala Krai, Kelantan, Malaysia. The data cover the 5-year period from 2012 to 2016 and consist of 1,827 observations. The performance of the models was compared in terms of accuracy, precision, recall and f-measure. The results showed that DT with the SMOTE method performed best, achieving 99.92% accuracy. The SMOTE method was also found to be highly effective in dealing with imbalanced datasets. It is hoped that the findings of this research may assist non-government or government organizations in taking preventive action against floods, which commonly occur in Malaysia due to the wet climate.
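
    The core idea of SMOTE, which this abstract credits for handling class imbalance, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified pure-Python illustration of that idea only. The function name `smote_sketch`, the parameter defaults and the two-feature points are hypothetical, and this is not the implementation used in the study.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Simplified illustration: needs at least two minority points, uses
    squared Euclidean distance and a brute-force neighbour search.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (itself excluded).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class observations with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6)
```

    Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.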

    Classification-and-Ranking Architecture Based on Intentions for Response Generation Systems

    Existing accounts of response generation are concerned only with the generation of words into sentences, either by means of grammar or statistical distribution. While the resulting utterance may be inarguably sophisticated, its impact may not be as forceful. We believe that the design of response generation requires more than grammar rules or statistical distributions; it must be more intuitive, in the sense that the response robustly satisfies the intention of the input utterance. At the same time, the response must maintain coherence and relevance, regardless of the surface presentation. This means that response generation is constrained by the content of intentions, rather than by lexicons and grammar. Statistical techniques, mainly the overgeneration-and-ranking architecture, work well in written language, where the sentence is the basic unit. However, in spoken language, where the utterance is the basic unit, the disadvantage becomes critical because spoken language also renders intentions, so short strings may have equivalent impact. The bias towards short strings during ranking is the very limitation of this approach, which leads to our proposed intention-based classification-and-ranking architecture. In this architecture, the response is deliberately chosen from a dialogue corpus rather than wholly generated, so that short ungrammatical utterances are allowed as long as they satisfy the intended meaning of the input utterance. The architecture employs two basic components: a Bayesian classifier that classifies user utterances into response classes based on their pragmatic interpretations, and an Entropic ranker that scores the candidate response utterances according to the semantic content relevant to the user utterance. The high-level, pragmatic knowledge in user utterances is used as features in Bayesian classification to constrain response utterances according to their contextual contributions, thereby guiding our Maximum Entropy ranking process to find the single response utterance that is most relevant to the input utterance. The proposed architecture is tested on a mixed-initiative, transaction dialogue corpus of 64 conversations in a theater information and reservation system. We measure the output of the intention-based response generation based on the coherence of the response against the input utterance in the test set. We also tested the architecture on a second corpus in emergency planning to verify the portability of the architecture across domains. In essence, intention-based response generation performs better than surface generation because the features used in the architecture extend well into pragmatics, beyond linguistic forms and semantic interpretations.